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Abstract 

In many experimental settings, one is tasked with obtaining information 
about certain relationships by applying perturbations to a set of independent 
variables and noting the changes in the set of dependent ones. While traditional 
design-of-experiments methods are often well-suited for this, the task becomes 
significantly more difficult in the presence of constraints, which may make 
it impossible to sufficiently excite the experimental system without incurring 
constraint violations. The key contribution of this paper consists in deriving 
constraint back-off sizes sufficient to guarantee that one can always perturb 
in a ball of radius Sg without leaving the constrained space, with Sg set by 
the user. Additionally, this result is exploited in the context of experimental 
optimization to propose a constrained version of G. E. P. Box’s evolutionary 
operation technique. The proposed algorithm is applied to three case studies 
and is shown to consistently converge to the neighborhood of the optimum 
without violating constraints. 

Keywords: design measures for robustness, evolutionary operation, optimization 
under uncertainties, experiment design 


1. Introduction 


In most branches of science, one often encounters systems where the 
relationship between some experimental r esponse and a finite number of 


independent variables needs to be studied (Montgomery!. 120121: iMvers et al 


2009f) . and it is generally assumed that the response is a dependent variable and 


a function of the independent ones. Mathematically, the experimental quantity 
may be stated as the function / : R”“ —>■ R, while u G M"" may be used to 
denote the vector of independent variables u = (ui,..., u„„). One is then left 
with the task of identifying /(u). The identification may be either local or 
global, and is usually done by conducting a series of experiments with different 


Email address: gene.buninaronininstitute.org (Gene A. Bunin) 


Preprint submitted to Elsevier 


May 3, 2016 













values of u, observing the resulting /(u) values, and performing some sort of 
regression. Such procedures are typically used to: 


(i) construct data-driven model ap proximations of f f o r when / is d i fficult 


to model v i a firs t 
Mvers et ah . 2009tl . 


principles (|Jones et al.l . Il998t iMonteomervl 12012 


(ii) estimate the uncertain parameters of an already available model (|Box . 
1990t Chen &: Joseph , 1987 : Pfaff et aP . l2006t Quelhas et ah . 2013 ). 


(iii) explore how the function value changes so as to find conditions for 
which the value is mi n imized , maximized , or e qual to a certain 


Robbins fc Monrd . 119511) . 


quantity (IBox fc Wilson . 195ll : IConn et . 2009t Lewis et ah . 20001 : 


It is the case for many problems that the experimental space of interest 
is a box defined by the constraints uf < Ui < uf, % = where 

= (uf,...,and are the lower and upper limits on 

the independent variables, respectively. Such problems typically correspond to 
simple set-ups that do not possess major safety limitations, and where testing 
any variable combination in the experimental space is permissible. Obtaining 
knowledge about / is not diffic ult in such condit ions, and the traditional 
design-of-experiments techniques (IMontgomervl . 120121) are perfectly appropriate 
here. 

However, there still exists a fair share of proble ms - many of them 
corresponding to continuous or batch chemical processes ( Bunin . 2016h - where 
additional constraints enter to reduce the experimental space in a nontrivial 
manner. These constraints may be expressed as the Ug inequalities 

5j (u) < 0, j = 1, ...,ng. 

In some problems, the functions gj may represent experimental relationships 
that, like /, can only be divined empirically. It often happens that such 
experimental constraints are safety or economic limitations - they could, for 
example, represent an upper limit on the temperature in a continuous reactor, 
or a lower limit on the purity of a batch-produced chemical. Despite the violation 
of such constraints being highly undesirable, or even dangerous, there currently 
exists no easy-to-implement, theoretically rigorous method for guaranteeing that 
the perturbations carried out on the system satisfy these constraints. 

Notably, there does exist a fairly established literature on methods that 
suppose the existence of a parametric model approximation gm,j{u, 9) « 5 j(u), 
define 0 as the uncertainty set to which 9 belongs, and then attack the problem 
via probabilistic f ormulations by ensuring that Qm . -i(u, 9 ) < 0 with suffi c ientR 


high probability (iKall fc Wallace 


LSuring tnat Qm . ilu, ^ U witn sum c ientlY 
1994 iLi et aU 120081: lOuelhas et all 120131: 


Sahinidij . 2004 Zhang et ah . 20021) . However, while such methods 


are 


theoretically just and robust, they suffer from four major practical drawbacks: 


(i) the requirement of a parametric model. 
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(ii) the restriction that the uncertainty be parametric, and that 0 be known, 

(iii) the computational issues that arise with probabilistic constraints, 

(iv) the conservatism that results from the probabilistic constraints reducing 
the set of admissible u. 


Drawback (i) becomes debilitating when the system at hand is difficult to 
model, while (ii) is more problematic since many employed models are, often by 
practical requirement, simplifications and thereby prone to structural errors 
( Chachuat et all 20091) . Drawback ( iii) is likely to be significant when the 
models have many decision variables, many uncertain parameters, and are 
involved. Simplific ations, such as linearizing the model with respect to 6 
( Zhang et al. . 20021) . may be used to avoid this, but ultimately come with the 
loss of rigor that one would expect from an approximation. Finally, (iv) can be 
extremely problematic when the parametric uncertainty set is large - as may 


often occur in practice (jLi et al 


20081: iQnelhas et all. [ 20 H - since this may 


limit the perturbation options, with only a small collection of u being deemed 
“safe”. 

The methodology proposed in the present work avoids these difficulties while 
maintaining the rigor. Taking a model-free, back-off approach, we simplify and 
generalize the results of iBunin et al. ( 2014a ) to derive positive values, bj, that, 
for a given u*, allow us to state the guarantee 


gj{u*) < -bj ^ gj{u) <0, Vu G Be, 


( 1 ) 


where 


Be = {vL: ||U - U*||2 < ^e}- 


Verbally, this means that given a decision-variable set u* known to satisfy the 
constraints with some slack, one is able to provide a guarantee that the entire 
ball of radius 5e surrounding u* will satisfy the constraints as well, thereby 
allowing the user to perturb anywhere within this ball without fear of constraint 
violation. Despite being local, such a result is nevertheless very useful as it 
allows a high degree of freedom - a ball permitting perturbation sets of any 
geometry. As will be shown, the value bj will depend on the local sensitivities 
of gj around u*, but can nevertheless be computed without requiring much 
effort from the user. Conversely, 5e is the sole tuning parameter set by the 
user and represents, in some sense, the magnitude of perturbation considered 
as “sufficiently exciting” for identification given the particular problem. 

To date, this res ult has alread y been integrated into the SCFO experimental 
optimization solver ( Buninl . [20151) . where it is used to ensure accurate linear and 
quadratic regression, but it is expected that the generality of the result make 
it applicable to many algorithms and contexts. In this paper, its usefulness 
is illustrated for a much simpler o ptim i zation algorithm - the evo lutionary 
operation (EVOP) method of Box (IBoxl 1957t Box fc Draoeii Il969l) . As the 
original method searches to maximize an experimental function by perturbing 
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in a hypercube around the best known u, it is made coherent with the result 
here by ensuring that the cube lie inside Be, with u* then defined as the best 
known reference point. By forcing u* to always satisfy o, it thus follows that 
all exploration by the modified EVOP version satisfy the constraints. 

The remainder of this paper is organized as follows. The required 
mathematical concepts and the derivation of the appropriate constraint back-offs 
are presented in Section 2. Section 3 then provides a robust extension of m 
that accounts for noise/error in the function values, together with a general 
discussion of potential implementation issues. The constrained EVOP algorithm 
is presented in Section 4, and its effectiveness is illustrated for three case-study 
problems. Section 5 concludes the paper. 


2. Derivation of Sufficient Back-Offs 


So as to keep the forthcoming analysis relatively simple, the following 
assumption on the continuity and differentiability of gj is made. 

Assumption 1. The functions gj are continuously differentiable (C^) on an 
open set containing Be- 


This then allows for the definition of bounds on the sensitivities of gj . 


Definition 1. The local Lipschitz constants of gj are defined as any constants 
Kji satisfying 


- Kji < 


dui 


^ Vll € 


( 2 ) 


The existence of these constants follows from Assumption [T] and the 
boundedness of Be- They may be used to bound the violation of a given gj 
via the local Lipschitz upper bound- 


Lemma 1. Let Ua,Ub S Be- It follows that 


Tlu 

^ T ^ ^ Kji\ui,^i 

i=l 


Proof. See lBunin et al.l (j2014bf l. 


(3) 

□ 


Finally, the Lipschitz bound may be exploited by substituting Uq —^ u* and 
Ufe —>■ u in ([3]) to generate a Lipschitz polytope around the point u*. 


Definition 2. Let Cj denote the Lipschitz polytope of the constraint gj around 
u*, defined as the set 


= S u : 5j(u*) -1- Kj^\u^ - <1 < 0 L 


(4) 
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The Lipschitz polytope has two important properties that should be 
apparent by inspection: 

(i) u G n ,Be < 0 , 

(ii) 5j(u*) <0^ Cj ^ 0. 

The “double membership” of u G Cj and u G -Be in (i) is required to ensure 
both that u satisfies the upper bound of ([3]) and that the bound itself is valid 
to begin with, respectively. Property (ii) should be evident if one just considers 
u := u* when gj{u*) < 0. 

Furthermore, it is clear that the content (hypervolume) of Cj increases 
monotonically as 5j(u*) decreases - i.e., Cj admits more and more 
implementable points because of the terms \ui — u*\ being allowed to grow 
larger while satisfying the inequality. 

It is this observation that inspires the foundations of the present work, 
illustrated geometrically in Figure [TJ If gj{u*) can be forced to remain 
sufficiently low - i.e., if gj(u*) can be made to satisfy the constraint with a 
certain back-off - then one can always guarantee the existence of a non-empty 
Lipschitz polytope centered at u*. Additionally, because the ball Be is also 
centered at u* , it may be inscribed inside the polytope, with Property (i) above 
then sufficient to guarantee that all points in this ball satisfy the constraint 
5j(u) < 0. 

The required size of this back-off is easily derived by exploiting the 
Cauchy-Schwarz inequality. 

Theorem 1. Let Kj denote the vector {kji, Kjn^) corresponding to a given 
dg and u . Ij" bj {jg|j/‘»:jjj 2 , then the 'implicotzon m holds. 

Proof. Defining the quantity Aui = \ui — u*\ and the vector Au = 
(Aui, ..., Aun.^), let us restate the definition of the Lipschitz polytope as 

Cj = {u : gjivC) nj Au < 0 } . 

From the Cauchy-Schwarz inequality, it follows that 

kJAu < ||Kj|12||Au||2 = ||Kj||2||u- U*||2, (5) 

which then allows us to consider a (ball) subset of Cj: 

^3 = {u : 5i(u*) -b ||K:jj|2||u- u*||2 < 0}, 
where Cj C Cj is evident since any u satisfying 


must also satisfy 


+ IlMjbllu- U*||2 < 0 


gj{\i*) -b /«JAu < 0 
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Figure 1: Geometric illustration of the concept behind using back-offs to enforce the existence 
of an excitation ball Be- The back-off is chosen large enough so that a ball of radius 5e may 
be inscribed inside the Lipschitz polytope corresponding to the back-off. 


by virtue of ([5]). 

It is then sufficient to show that Be C Cj for the back-off specified. Because 
every \i& Be satisfies ||u — u *||2 < 5e, it follows that any point in Be will belong 
to Cj if the inequality 5 j(u*) -|- i5e||/«j||2 < 0 holds, which is precisely what is 
ensured by the back-off. Thus, Be ^ Cj C £j, with Property (i) of the Lipschitz 
polytope then leading to the desired result. □ 

It is natural to ask what one should do if, given 5e, u*, and (consequently) 
Kji, the condition gj{u*) < —(5e||/«j||2 does not hold. One solution consists in 
choosing a different, safer u* from the available measurements, maintaining 
the same 6e, and hoping that the back-off is satisfied for this new choice. 
Alternatively, one may keep u* the same and reduce Se- Assuming that 
gj(u*) < 0, it is not hard to show that for 6e sufficiently small the back-off 
will be met. Consider, for example, the reduction sequence 



together with the corresponding Be- 





Clearly, each reduced Be will be a strict subset of the previous, from which 
we may deduce that the valid for n := 0 will remain valid for all n, thus 
allowing us to upper bound the ||/«j ;||2 portion of the back-off and to conclude 
that (5e||Kj||2 may be made arbitrarily close to 0 by choosing n sufhciently large 
(^e sufficiently small), thereby resulting in a back-off that is satisfied by gj{u*). 


3. Implementation Issues 

On the surface, the derived result seems easy to implement as it simply 
states that if, given u*, one wants to perturb anywhere in Be without fear of 
constraint violation, one only needs to confirm that gj (u* ) < -Se\\Kj\\2, Vj. If 
this condition does not hold, one can then choose a different u* from the set of 
applied u or decrease 6e- However, there remain a number of implementation 
aspects that merit discussion. 


3.1. Setting of Lips chit z Constants 

Theorem [T] provides a robust guarantee only if it is assumed that the 
Lipschitz constants provided satisfy While one can argue that picking 

very large, conservative values is sufficient, this has the obvious performance 
drawback of increasing (5e||Kj||2 and thus bj, which could make satisfying 
Sj(u*) < —bj impossible. As such, one would prefer setting these constants 
in more intelligent ways. 

Apart from some sc a rce and limited result s in the global o ptimi zation 
literature ( Hansen et ah . 1992 : Stronginl . Il973l : I Wood fc Zhang . Ii99(tI1 . the 
problem of estimating Lipschitz constants effectively is still very open. Some 
techniques have b een p roposed in the recently submitted manuscript of 
Bunin fc FrancoisI ( 2016r) . and suggest mixing a priori knowledge about the 


functions gj, which may provide good initial guesses of the constants, together 
with data-driven refinements, making explicit use of the fact that for a 
sufficiently local or approximately linear region the Lipschitz constants differ 
little from the local function derivatives. In the case that parametric models 
(u, 6) are available, one could bolster the a priori knowledge with the robust 
model-based estimates 


:= max 
U G Be 
6» G 0 


^9m,j 


dui 

u,6» 


as the Lipschitz-const ant est i mates . This mix of techniques has been employed 
in the SCFO solver ( Bunin . l2015ll and has generally led to relatively robust 
performance in tes t prob l ems, although constraint violations may still occur 


from time to time ( Bunin . 2016ll . 


It is worth noting that the EVOP method proposed in Section S] appears to 
handle this issue very well, using iterative local linear regression to estimate the 
Lipschitz constants in a manner that requires no input from the user and yet 
avoids constraint violation completely. 
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3.2. Accounting for Noise/Error 

Enforcing ^^(u*) < —bj also relies on having accurate knowledge of gj{u*). 
While it is reasonable to expect that in most applications one will be able to 
either measure or estimate the values of these functions at u* , for experimental 
functions it is generally the case that a perfect measurement or estimation will 
be impossible, and that there will oi ily be access to t he cor rupted values, a*. A 
typical assumption ( Hotelline . 1941 : Marchetti et ah . 2010l : More fc Wild . 2011 
Mvers et alJ . 2009ll is that this corruption is additive - i.e., that 


=5j(u*) +w, 

with w a stochastic element for which the estimates of at least the mean 
and variance are available. In this case, it becomes possible to compute a 
high-probability boundi ng value, gAuf) < o *, using, in the most general case. 


Chebyshev’s inequality (IMore fc Wildl . 1201 if) , or something less conservative if 
better assumptions on the nature of the noise are avail able. More i nvolved 
techniques for computing q* are outlined in Secti on 4 of I Bunin et al.l (j2014all 
and in the recent work of iBunin fc FrancoisI (|2016r i. 


One may then work with the robust condition 'g* < —bj instead, since 
satisfaction of this condition implies the satisfaction of 5 j(u*) < —bj with a 
high probability. 


3.3. Accomodating Numerical Constraints 

So far, gj has been treated from a very general perspective, and has been 
assumed only to be a function over Be. However, there are plenty of 
constraints for which much more knowledge is available. In particular, when 
the constraint gj is a numerical function that can be evaluated by hand or by 
computer for any desired u G Be, tbe guarantee of perturbing in Be without 
incurring violation of the given constraint is fairly easy. Namely, it suffices to 
ensure that 


maxgj (u) < 0, 

since this trivially guarantees that all perturbations in Be are feasible for 
that constraint, without requiring Lipschitz constants or robust upper bounds. 
For some cases, it may be that gj is an involved nonconcave function, 
potentially making its maximization over a ball a numerically challenging (global 
optimization) task. When this occurs, one could attempt to take a concave 
upper bound or, if worse comes to worst, estimate its Lipschitz constants - a 
significantly easier task for a numerical function - and then employ the general 
(albeit conservative) result of Theorem [1] 

A special subclass of numerical constraints that is even easier to work with is 
that of the bound constraints, uf < Ui < , which are extremely relevant since 

they tend to occur in virtually any well-defined experimental investigation. By 
simple intuition and quick inspection, one should be able to see that employing 
a back-off of Se is sufficient for these bounds. Not surprisingly, it is possible to 







































arrive at this same result by Theorem [TJ as the vector for these constraints 
may simply be taken as the vector of n„ — 1 zeros with unity in the spot, 
thus leading to = ^e- 

One could, however, choose not to back off from the bound constraints since, 
given their orthogonal nature, one is always guaranteed to retain a full orthant 
in which one can perturb. Given with the feasibility of Sg, this means that one 
could ignore these back-offs and still retain 1/2"“ of Be for safe perturbation. 
For most cases, this fraction of Be still offers enough room for the user to 
accomplish what they aim to, such as estimating a derivative or locally exploring 
the function’s behavior. 


3 . 4 . Scaling and the Tuning of the Excitation Radius 

It is evident that the results provided are not scale-invariant, and that a 
ball of radius 5e could offer very poor perturbation in directions that vary over 
a much greater domain than others - geometrically, this may be seen as the 
problem of the efficiency of inscribing a ball inside a Lipschitz polytope. One 
solution could be to generalize the notion of the ball to an ellipse, and to rederive 
the result for this more versatile case. However, a simple approach found to work 
well has been to simply scale the independent variables so that each Ui varies 
between 0 and 1, in which case perturbing in a ball is reasonable. 

This aside, one may still have to make a nontrivial decision about what 5e 
to set. Ultimately, this choice should depend on the particular problem that 
the user is trying to solve, with all available a priori knowledge exploited to 
yield an appropriate choice. For example, if the goal is to explore as much 
of the experimental space as possible and to build a model that encompasses 
a large, perhaps even global, domain, it may be of interest to set 5e to be as 
large as possible. If the goal is to perturb just enough to estimate a derivative 
while avoiding significant corruption due to noise, one m a y wan t to choose 5e 
as a function of the expected signal-to-noise ratio (IBuninl . I20I5L §3.14). When 
information about the function’s curvature is available, it may be possible to 
choose 5e so as to strike a balanc e between the corrupt ion due to noise and 
the corruption due to nonlinearity ( Marchetti et ah . 20I0ll . Finally, the ad hoc, 
post-scaling choices of 5e ■= 0.01,0.05,0.10 have also often worked well, in the 
author’s experience. 


4. Feasible-Side Evolutionary Operation 

An evolutionary operation (EVOP) algorithm is constructed for the 
feasible-side solution of experimental optimization problems having the form 

minimize (/(u) 

U 

subjectto 5 j(u)< 0 , / = !,...,rig (6) 

Ui<Ui<uY, i = l,...,nu, 

where both (j) and gj will, in general, represent experimental functions. As 
in many experimental optimization problems of this type, there is a constant 
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trade-off between adapting the decision variables u so as to minimize (j) and 
perturbing them so as to learn more about the local or general natures of (/> and 
gj. Both tasks are necessary - one cannot optimize without perturbation, but 
one cannot optimize if one spends all of the experimental resources perturbing 
the system for knowledge, either. 

Two main reasons motivate this choice of application. First, the derived 
result of Theorem [T] is innately coherent with the direct-search algorithmic 
nature of EVOP, which iteratively perturbs in a local region and then shifts this 
region so that it is centered around the best found point. It is thus quite natural 
to let EVOP perturb in Be and then shift u* accordingly, which immediately 
provides the traditional EVOP with constraint satisfaction guarantees. The 
second reason has to do with th e practical usefulness of the constructed scheme. 
Ever since its inception in 1957 ( Box . 1957ll . EVOP has enjoyed great popularity 
in industry, largely because of its extreme simplicity. In addition to being simple, 
it would later be shown to be theoretically well founded as well, with the slight 
addition of a step-size rule allowing it to enjoy the gl obal conv e rgenc e properties 
of Torczon’s generalized pattern search algorithms (|Torczonl ll997T) . However, 
both its industrial and theoretical success has, for the most part, been limited to 
simple problems with bound co nstraints, and whil e one cou ld use the stan dard 
(i.e., penalty-function) methods ( Conn et al. . 20001 Ch. 14) (Fletcher, 1987 . Ch. 
12) to convert problems with gj con straints into the bou nd-constrained form for 


which EVOP is readily applicable ( Rutten et ah . 2015h . these approaches are 


ultimately not safe and can at most only offer the guarantee that the constraints 
are satisfied upon convergence. 

From this point of view, the algorithm proposed in this section is believed 
to carry great potential as a stand-alone contribution that, in addition to being 
effective for the problems tested, is also extremely easy to apply and requires 
minimal input from the user. 


4.I. Description of the Algorithm 

Prior to stating the algorithm, let us first make some knowledge assumptions 
so as to ensure its basic functionality. 

First, it is assumed that, for a given tested u, one measures both (j) and gj 
with additive white Gaussian noise: 

^ = (j){u) + Wj„ Wj,Af{0,a'^) 

= 93 (u) + , Wj - A/'(0, cr2). 

For simplicity, it will be assumed that (jj, and rr = (cji, ...,(Tn ) are known. 

It will also be assumed that a sufficiently “safe” initial u* is available, so 
that the 2n„ test points generated by perturbing the individual elements of u* 
by ±(5e, 


±(5e,...,<^), 
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K,<±<5e), 

are, together with u*, safe and satisfy the constraints. 
Scaled variables will be denoted with (') and defined as 


= 


Ui — ut 


uY-ur 


= 1, 


..., Tl^ 


This scaling step will be implicit in the algorithm statement that follows - i.e., 
we will switch between the scaled and unsealed variables as needed without 
explicitly including the affine operation above in the algorithm steps. 


Algorithm 1 (Feasible-Side EVOP) 


1. (Initialization) The initial reference point u*, the standard deviations 
and cr, and the decision-variable bounds u^, are provided. The 
excitation radius 0.5 > Jg > 0 is set by the user. 


2. (Evaluation at Reference Point) Apply u* to the experimental system to 
obtain (p* and g*. Define U := (u*)^, (p := (p*, G := [gj) • • • 

3. (Perturbation) For i = 1, ...,n„: 


(a) (Perturbing by ±(5e) Obtain u+ and u_ by perturbing the i**' element 
of u* by de and —Se, respectively. 

(b) (Upper Bound Constraint Check) If < uf, apply u+ to the 
experimental system to obtain the corresponding measurements (p 
and gj, denoted by (p'^ and g^, and augment U, </>, and G: 


U := 


G := 


tJ 

, cp ■= 

r 

(p 1 

L "b J 

[ J 



G 

1 



gt 


■■ 9n 


(c) (Lower Bound Constraint Check) If u-^i > uf, apply u_ to the 
experimental system to obtain the corresponding measurements (p 
and gj, denoted by (p~ and gJ, and augment U, (p, and G: 


u 

, cp := 

r 

(p 1 

~ T 

ui 

u-j 



G 

1 



5i ■■■ 9n, \' 
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(d) (Set Number of Points Tested) If 


0 < U* ± (5e < 1, 

set Si := 2. Otherwise, set Si := 1. 

4. (Local Linear Regression) Fit a linear model, 

riu 

i=l 

to the observed cost function values, with the coefficients 

recovered as the hrst n„ elements of the least-squares solution [U 
Set = (/3i, Pnu) as the estimate of the gradient of ^ at u*. Repeat 
the same procedure with the columns of G to obtain the gradient estimates 
Vffj for j = l,...,ng. 

5. (Lipschitz Constants) For each i = 1, and j = 1, ...,ng, set 


Kji 


M + 6 ^, 

dUi Side 


where dg*/dui denotes the element oiVg*. 

6. (Nearly Active Constraints) Define the index set 


■ f j : +3 CTj > -5e||/«j|l2 

' (for at least one measurement gj in the j**' column of G 

as the index set of nearly active constraints. 

7. (Approximation of Lagrangian) Defining the gradient of the Lagrangian 
at u* as 

"g 

VL(A)=V<^*+^A,Vg*, 

approximate the corresponding Lagrange multipliers. A*, by the solution 
to the constrained least-squares problem 

A* := argminimize VL(A)^VL(A) 
subject to Aj > 0, Vj 

Xj = 0, Vj ^ Ja- 
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8. (Choose New Reference) Set as u* the point in TJ that 

(i) has the smallest VL(A*)^u value, 

(ii) has the corresponding upper bounds g* := g* + Saj that satisfy 
g* < -Se\\Kj\\ 2 , Vj. 

If no such u* exists, maintain the same reference point as before. Define 
U := (u*)^, 4> ■= ^*1 G := [g* • • • and return to Step 3. 

A number of remarks are in order: 


• As the scaling reduces the variable space to a unit hypercube, there is no 
need for 5e to be set above 0.5, as this would preclude the existence of a 
feasible Be- 


• The perturbation scheme used here differs from that of the traditional 
EVOP methods, which use a 2”’" factorial scheme to define the test points 


( BoxL 1957 : Box fc D raneil 1969), an d is in this sense more similar to a 


coordinate search ( Conn et al. . 20091 Ch. 7). The rationale for doing 


this is that requiring 2 nu perturbations leads to better efficiency as 
increases (linear as opposed to geometrical), requires fewer perturbations 
for Uu > 2, and appears to be sufficient for the given tasks. 


• As suggested in Section 13.31 no back-offs are used for the bound 
constraints, ensuring instead that they are never violated during the 
perturbation phase. 


• It is not difficult to show that the derivative estimates obtained by linear 
regression of the local data set are, as a result of the orthogonality and 
symmetry of the perturbations, identical to the difference quotients 

^ ^ gt - 9j 

dui 25e 


when Si := 2, or to either 




9j 9j 

—- or — 


when Si := 1. It is well known (see, e.g., Daniel fc Heerern^ (ll95Clll l that 


the standard deviation for such estimates is equal to cjj'/2l{si5e)- In 
adding six times this quantity to the absolute value of the derivative 
estimates, we use half of that amount (“three sigma”) to account for 
potential errors in the estimates due to noise, and the other half to 
compensate for the fact that the Lipschitz constants only match the 
derivatives very locally, and may actually be larger for the Be at the future 
cycle (i.e., the Lipschitz-constants estimation is always one cycle behind). 
This is important since the new reference is always chosen subject to the 
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restriction that the corresponding g* < —dellII 2 , Vj, where the back-off 
is defined using the Lipschitz constants at the current cycle. This then 
encourages the guarantee of safe excitation around the new reference. 

• Adding Sctj to the constraint measurements in Steps 6 and 8 robustifies 
the scheme against measurement noise, resulting in the upper bounding 
values 'Qj := gj + Saj that bound the true function values with a high 
probability of about 99.85%. 


• Because the problem addressed is constrained and uses Lipschitz bounds to 
guarantee robust constraint satisfaction, it follows that the scheme could 
converge suboptimally if a llowed to get too c lose to a constraint. This 
iss ue was first observe d bv I Bunin et al. ( 2011 1 and dealt with rigorously 
by Bunin et al.l ( 2Q14cll , and represents a main drawback of using Lipschitz 
constants for constraint satisfaction. So as to avoid this issue here, 
it has been proposed to minimize an approximation of the Lagrangian 
instead of simply minimizing <j). Note that when u* is far from any 
constraints, the scheme simply sets A* := 0 and the two objectives are 
equal. However, when some gj start to get close to activity, the scheme 
defines the objective as a trade-off between minimizing (j) and lowering 
the nearly active constraint values, which tends to allow the algorithm 
to “slide off” the nearly active constraints in application. The theoretical 
rigor of this scheme is not entirely clear, but it may be seen to be consistent 
upon convergence, in that finding A* such that VL(A*) Ri 0 implies that 
u* is a constrained stationary point, provided that the gradient estimates 
are not too erroneous. 


4-2. Application to Case-Study Examples 

Three pro blems are chosen from the ExpOpt database of test problems 
( Buninl . [ 2 OI 6 L Problems P2, P3, P6) so as to illustrate the performance of the 
algorithm when applied to realistic case-study scenarios. The first problem 
consists in varying the feed rate and temperature of a Williams-Otto plant 
so as to maximize its steady-state profit while honoring a n upper lim i t on a 
noxious product, and is originally tak en from the paper o f Marchetti ( 2013h. 


The se cond, adapted from the works of Centric et al.l ( 1999h and Francois et al 


deals with minimizing the operating time of a polysterene batch reactor 
while honoring a minimal molecular weight specification. Here, two “switching 
times” that help define the temperature profile of the bat ch are taken as the 
decisio n variables. Finally, the third problem comes from Francois &: Bonvinl 
( 2 OI 3 II and seeks to maximize the steady-state production of a continuous 
reactor by varying two feed rates subject to two experimental constraints. 
Two-dimensional problems have intentionally been chosen as they allow for 
easy visualization and interpretation of the results. The different problem 
specifications are provided in Table [TJ 

Inputting the specifications into Algorithm 1, the algorithm’s performance 
is tested for the ad hoc choice of 5^ '■= 0.05. The results are given in Figure [2] 
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Table 1: Problem specifications for the case-study examples. 


# 

Initial u* 

O'cj) 

O’ 



P2 

(3.5,72) 

0.5 

5-10-^ 

(3,70) 

(6,100) 

P3 

(242,945) 

60 

iF 

(50,600) 

(450,1000) 

P6 

(14.5,14.9) 

0.1 

(0.03,0.03) 

(1,1) 

(50,50) 


and show that the algorithm succeeds in exploring the decision-variable space 
while satisfying the constraints - not a single violation is observed. By contrast, 
suppose that one tried to apply the same algorithm to Problem P2 but without 
the application of the safety back-off derived in Theorem [1] This is done by 
substituting —5e||Kj||2 —?> 0 in Steps 6 and 8 of Algorithm 1. The result is 
given in Figure |3l Again, it is seen that the use of the Lagrangian leads to the 
algorithm successfully converging to a neighborhood of the optimum. However, 
constraint violations occur repeatedly along the convergence trajectory. 

The convergence properties of the algorithm are of interest - in particular, 
one notices that the algorithm converges to a region that is relatively suboptimal 
in Problem P6. This is due to the geometry of the problem, the conservatism 
of the back-offs, and the size of the noise in the constraint measurements. In 
other words, the algorithm cannot make further progress while robustly (to Saj ) 
satisfying the specified back-offs. While proving convergence to an optimum 
in a mathematically rigorous manner is outside the scope of this paper, for 
functions one should expect the algorithm to converge properly if both the noise 
and back-offs go to 0 asymptotically, as Jg i 0 would lead to the line ar regression 


conve rging to the true function gradient in the absence of noise (jConn et al 
120091 §2.4), with the removal of noise also removing the conservatism introduced 
by using the upper bounds gj + iaj. To test this hypothesis empirically, let us 
use a modihed implementation of Algorithm 1 where 


Se{k) := SelVk, 

a^{k) := G^l\fk, 

cr(k) := crj^fk^ 

with k starting at 1 and being incremented by 1 after each cycle of the 
algorithm. The results are presented in Figure S] and largely confirm one’s 
expectations, with feasible-side convergence very close to the optimum achieved 
for all problems. Although the neighborhood of convergence appears to be far 
from the optimum in Problem P3, this is due to the geometry of the problem - 
because both the cost and constraint functions are close to linear in the region 
of the optimum, there is a wide range of points that achieve a low cost and are 
close to stationarity all along the constraint. 
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Figure 2: Plots of the decision variables and cost function values for Problems P2 (top), 
P3 (middle), and P6 (bottom). Green points in the decision-variable plots denote the true 
optimum, red points denote the individual experiments, and blue points denote the reference 
(best) point at the final tested EVOP cycle. Red regions denote the infeasible (not safe) 
portions of the operating space as defined by the constraints. In plotting the cost function 
values, the blue lines denote the true (noiseless) values, while the constant black line denotes 
the value at the optimum. 
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Figure 3: Illustration of EVOP performance for Problem P2 when the back-off is not employed. 


5. Closing Remarks 


This paper has contributed to the problem of obtaining information about 
an experimental function in the presence of general constraints, and we 
have derived a rigorous constraint back-off that, when satisfied at some u*, 
guarantees that the user may perturb anywhere in the ball of radius dg around 
u* without incurring constraint violations. This result is believed to constitute a 
useful contribution to scientific problems dealing with constrained experimental 
spaces, as it offers a straightforward way to guarantee safety while exploring the 
experimental space. 

While the results simplify greatly for numerical constraints, in the case 
of general (likely experimental) constraints one is forced to obtain local 
sensitivity bounds - Lipschitz constants - for the constraint function in order 
to compute the appropriate back-off. To do this well may be challenging, 
but relevant methods do exist and may work quite well in certain contexts 
( Bunin fc: Francois , 2016 ). 

The feasible-side EVOP optimization algorithm has provided an interesting 
application of the derived back-off result, and has been shown to work very 
well with respect to constraint satisfaction and convergence for three different 
case studies. While more involved than the traditional EVOP procedure, the 
algorithm nevertheless retains its ease of implementation, requiring the user 
to only set 5e prior to applying it to a problem. It goes without saying that 
numerous performance improvements are possible but have not been the focus 
of this paper. One could, for example, use all of the measurements obtained 
since initializ ation to filter out the noise, an idea routin ely employed in the 
SCFO solver ( Buninl . [20151 §3) and recently proposed bv I Gao et al. ( 2016 1 in 
the context of multilayer real-time optimization. One could also attempt to 
choose search directions in a more intelligent manner, such as what is done in 
some derivative-free methods ( Conn et al. . 20091: Lewis et ah . 200r)[l . Clearly, 
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Figure 4: Plots of the decision variables and cost function values for Problems P2 (top), P3 
(middle), and P6 (bottom) when 5e, cr^, and cr are gradually reduced over the course of 
operation. 
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one could generalize the method to handle noise that is not white Gaussian, as 
well, although the computational effort may increase due to a lack of closed-form 
expressions. 
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